Convolutional neural networks can achieve excellent performance on semantic segmentation tasks. However, such neural network approaches rely heavily on expensive pixel-level annotations. Semi-supervised learning is a promising resolution to this problem, yet its performance still lags far behind that of its fully supervised counterpart. This work proposes a cross-teacher training framework with three modules that significantly improves traditional semi-supervised learning approaches. The core is a cross-teacher module, which can simultaneously reduce the coupling between peer networks and the error accumulation between teacher and student networks. In addition, we propose two complementary contrastive learning modules. The high-level module can transfer high-quality knowledge from labeled data to unlabeled data and promote separation between classes in feature space. The low-level module can encourage low-quality features to learn from the high-quality features of peer networks. In experiments, the cross-teacher module significantly improves the performance of traditional student-teacher approaches, and our framework outperforms state-of-the-art methods on benchmark datasets. The source code of our CTT will be released.
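Student-teacher frameworks of this kind typically maintain the teacher as an exponential moving average (EMA) of the student's weights. The abstract does not spell out CTT's exact update rule, so the sketch below shows only the generic EMA mechanic on toy weight dictionaries, not the paper's cross-teacher module itself:

```python
import numpy as np

def ema_update(teacher_w, student_w, alpha=0.99):
    """Exponential-moving-average teacher update:
    teacher <- alpha * teacher + (1 - alpha) * student."""
    return {k: alpha * teacher_w[k] + (1.0 - alpha) * student_w[k]
            for k in teacher_w}

# Toy weights: the teacher slowly tracks the student without sharing
# its gradient noise, which is what stabilizes pseudo-labels.
teacher = {"w": np.array([0.0, 0.0])}
student = {"w": np.array([1.0, 1.0])}
for _ in range(3):
    teacher = ema_update(teacher, student, alpha=0.5)
print(teacher["w"])  # moves from [0, 0] toward [1, 1]
```

With a large `alpha` (e.g. 0.99) the teacher changes slowly, which is the usual choice in practice; `alpha=0.5` is used here only so the drift is visible in three steps.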
The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.
Learning dexterous manipulation skills is a long-standing challenge in computer graphics and robotics, especially when the task involves complex and subtle interactions between the hand, the tool, and the object. In this paper, we focus on chopsticks-based object relocation tasks, which are common yet demanding. The key to successful chopsticks skills is a stable grasp of the sticks, which also supports delicate maneuvers. We automatically discover physically valid chopsticks holding poses with Bayesian Optimization (BO) and deep reinforcement learning (DRL), which works for multiple holding styles and hand morphologies without the need for example data. Given the discovered holding pose and the desired object to be moved as input, we build a physics-based hand controller that accomplishes the relocation task in two stages. First, kinematic trajectories are synthesized for the chopsticks and the hand in a motion planning stage. The key components of our motion planner include a grasping model that selects suitable chopsticks configurations for grasping the object, and a trajectory optimization module that generates collision-free chopsticks trajectories. Then we train a physics-based hand controller, again through DRL, to track the desired kinematic trajectories produced by the motion planner. We demonstrate the capabilities of our framework by relocating objects of various shapes and sizes, in diverse holding styles and with multiple hand morphologies. Our system achieves faster learning and better control robustness than vanilla systems that attempt to learn chopsticks-based skills without the holding-pose optimization module and/or without the kinematic motion planner.
To date, most existing steganalysis methods are designed for grayscale images, and they are not suitable for the color images widely used on current social networks. In this paper, we design a universal color image steganalysis network (called UCNet) for both the spatial and JPEG domains. The proposed method consists of preprocessing, convolutional, and classification modules. To preserve the steganographic artifacts in each color channel, the preprocessing module first separates the input image into three channels according to the corresponding embedding space (i.e., RGB for spatial steganography and YCbCr for JPEG steganography), then extracts the image residuals with 62 fixed high-pass filters, and finally concatenates all the truncated residuals for subsequent analysis, rather than summing them with normal convolutions as existing CNN-based steganalyzers do. To accelerate network convergence and effectively reduce the number of parameters, the convolutional module carefully combines three types of layers with different shortcut connections and group convolution structures to learn high-level steganalytic features. The classification module employs global average pooling and a fully connected layer for classification. We conduct extensive experiments on ALASKA II to demonstrate that the proposed method achieves state-of-the-art results compared with modern CNN-based steganalyzers (e.g., SRNet and J-YeNet) in both the spatial and JPEG domains, while maintaining relatively modest memory requirements and training time. Furthermore, we provide the necessary descriptions and many ablation experiments to verify the rationality of our network design.
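The preprocessing step, convolving each channel with fixed high-pass filters and truncating the residuals, can be illustrated in a few lines. The single first-order kernel and threshold below are illustrative stand-ins for the paper's bank of 62 filters, not the actual UCNet filter set:

```python
import numpy as np

def highpass_residual(channel, kernel, T=3):
    """Slide a fixed high-pass kernel over one color channel (valid mode,
    correlation) and truncate the residual to [-T, T], as is standard in
    steganalysis preprocessing."""
    kh, kw = kernel.shape
    h, w = channel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(channel[i:i + kh, j:j + kw] * kernel)
    return np.clip(out, -T, T)

# A simple horizontal first-difference kernel; large pixel jumps saturate
# at the truncation threshold, keeping residual statistics bounded.
kernel = np.array([[-1.0, 1.0]])
img = np.array([[0.0, 2.0, 10.0],
                [5.0, 5.0, 5.0]])
res = highpass_residual(img, kernel, T=3)
print(res)  # [[2. 3.] [0. 0.]] -- the 8 is clipped to 3
```

In the actual network, the truncated residuals of all three channels are concatenated along the channel axis rather than summed, so that per-channel embedding traces are preserved.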
We introduce a new library named abess that implements a unified framework of best-subset selection for solving diverse machine learning problems, e.g., linear regression, classification, and principal component analysis. In particular, abess certifiably obtains the optimal solution within polynomial time under the linear model. Our efficient implementation allows abess to solve best-subset selection problems as fast as, or even 20x faster than, existing competing variable (model) selection toolboxes. Furthermore, it supports common variants such as best group subset selection and $\ell_2$-regularized best-subset selection. The core of the library is programmed in C++. For ease of use, a Python library is designed for convenient integration with scikit-learn and can be installed from the Python Package Index. In addition, a user-friendly R library is available on the Comprehensive R Archive Network. The source code is available at: https://github.com/abess-team/abess.
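The problem abess solves can be stated with a brute-force reference implementation: among all size-k column subsets, find the one minimizing the residual sum of squares. The exhaustive search below is only for illustration; abess reaches the same optimum with a polynomial-time splicing algorithm rather than enumeration:

```python
from itertools import combinations
import numpy as np

def best_subset_ols(X, y, k):
    """Exhaustive best-subset selection for linear regression: try every
    size-k subset of columns and keep the one with the smallest RSS.
    Exponential in p -- a reference implementation, not abess's algorithm."""
    n, p = X.shape
    best_rss, best_subset = np.inf, None
    for subset in combinations(range(p), k):
        Xs = X[:, subset]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        r = y - Xs @ coef
        rss = float(r @ r)
        if rss < best_rss:
            best_rss, best_subset = rss, subset
    return best_subset, best_rss

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4]  # noiseless: true support is {1, 4}
subset, rss = best_subset_ols(X, y, k=2)
print(subset)  # (1, 4)
```

Because enumeration costs $\binom{p}{k}$ least-squares fits, this approach is infeasible beyond a few dozen variables, which is exactly the gap the library's certified polynomial-time solver addresses.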
We propose a Visual Teach and Repeat (VTR) algorithm using semantic landmarks extracted from environmental objects for ground robots with a fixed monocular camera. The proposed algorithm is robust to changes in the starting pose of the camera/robot, where a pose is defined as the planar position plus the orientation around the vertical axis. VTR consists of a teach phase, in which the robot moves along a prescribed path, and a repeat phase, in which the robot tries to repeat the same path starting from the same or a different pose. Most available VTR algorithms are pose-dependent and cannot perform well in the repeat phase when starting from an initial pose far from that of the teach phase. To achieve more robust pose independence, the key is to generate, during the teach phase, a 3D semantic map of the environment containing the camera trajectory and the positions of surrounding objects. For the specific implementation, we use ORB-SLAM to collect camera poses and the 3D point cloud of the environment, while YOLOv3 detects objects in the environment. We then combine the two outputs to build the semantic map. In the repeat phase, we relocalize the robot based on the detected objects and the stored semantic map. The robot is then able to move toward the teach path and repeat it in both the forward and backward directions. We have tested the proposed algorithm in different scenarios and compared it with the two most relevant studies. We also compared our algorithm with two image-based relocalization methods: one purely based on ORB-SLAM, and one combining SuperGlue and RANSAC. The results show that our algorithm is much more robust with respect to pose variations as well as environmental alterations. Our code and data are available on the following GitHub page: https://github.com/mmahdavian/semantic_visual_teach_repeat.
The development of social media user stance detection and bot detection methods relies heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest original dataset in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
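The benefit of multiple relations comes from aggregating neighbor features separately per relation type. The sketch below shows an R-GCN-style propagation step over a toy two-relation graph; MGTAB itself provides the graphs and features, not any fixed model, so the aggregation rule here is an assumption for illustration:

```python
import numpy as np

def multi_relation_aggregate(features, adjacencies):
    """Average each node's neighbor features per relation, then sum across
    relations -- one propagation step in the style of relational GCNs."""
    h = np.zeros_like(features)
    for A in adjacencies:                # one adjacency matrix per relation
        deg = A.sum(axis=1, keepdims=True)
        deg[deg == 0] = 1.0              # isolated nodes: avoid divide-by-zero
        h += (A @ features) / deg        # mean over that relation's neighbors
    return h

# Toy graph: 3 users, 2 relation types (say, follow and mention),
# 2-dimensional user features.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
A_follow  = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
A_mention = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]], dtype=float)
H = multi_relation_aggregate(X, [A_follow, A_mention])
print(H)
```

Dropping one of the adjacency matrices changes `H`, which is a minimal picture of why models that consume all 7 relation types can outperform single-relation or purely feature-based baselines.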
The interview has been regarded as one of the most crucial steps in recruitment. To fully prepare for the interview with recruiters, job seekers usually practice with mock interviews among themselves. However, such mock interviews with peers are generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are unlikely to behave like real interviewers. Due to the rapid growth of online recruitment in recent years, recruiters tend to hold interviews online, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) interview data are now available but still low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and the dialog generator, so that most parameters can be trained with ungrounded dialogs as well as resume data, neither of which is low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results in generating mock interviews. With the help of EZInterviewer, we hope to make mock interview practice easier for job seekers.
Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
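The dimension-reduction step, projecting the high-frequency covariates onto their leading principal components before the Q-network sees them, can be sketched with an SVD-based PCA. The latent dimension and data below are toy assumptions, not the paper's diabetes application:

```python
import numpy as np

def pca_compress(Z, d):
    """Project covariates Z (n x p) onto their top-d principal components
    via SVD of the centered data matrix; returns scores and components."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:d].T, Vt[:d]

rng = np.random.default_rng(1)
# 200 time points of 10 high-frequency covariates that actually live on a
# 2-dimensional latent process, so two components capture everything.
latent = rng.standard_normal((200, 2))
Z = latent @ rng.standard_normal((2, 10))
scores, components = pca_compress(Z, d=2)
print(scores.shape)  # (200, 2): the compressed state fed to the Q-network
```

Because the toy `Z` has exact rank 2, the two retained components reconstruct the centered data perfectly; with real mixed-frequency data, `d` trades off information loss against the Q-network's input dimension.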
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary bias: the annotated target segment generally refers to two specific frames as the corresponding start and end timestamps. The video downsampling process may lose these two frames and take adjacent, irrelevant frames as the new boundaries. 2) Reasoning bias: such incorrect new boundary frames also lead to reasoning bias during frame-query interaction, reducing the generalization ability of the model. To alleviate the above limitations, in this paper we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames that enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationship among these frames and generate soft labels on the boundaries for more accurate frame-query reasoning. This mechanism is also able to supplement the sampled sparse frames with the absent consecutive visual semantics for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.